Abstract —Good results on image classification and retrieval
using support vector machines (SVM) with local binary patterns
(LBPs) as features have been extensively reported in the literature
where an entire image is retrieved or classified. In contrast, in
medical imaging, not all parts of the image may be equally
significant or relevant to the image retrieval application at
hand. For instance, in a lung x-ray image, the lung region may
contain a tumour, hence being highly significant, whereas the
surrounding area does not contain significant information from
a medical diagnosis perspective. In this paper, we propose to detect
salient regions of images during training and fold the data to
reduce the effect of irrelevant regions. As a result, smaller image
areas will be used for LBP feature calculation and, consequently,
classification by SVM. We use the IRMA 2009 dataset with 14,410
x-ray images to verify the performance of the proposed approach.
The results demonstrate the benefits of the saliency-based folding
approach, which delivers classification accuracies comparable with
the state of the art but exhibits lower computational cost and storage
requirements, factors highly important for big data analytics.

Keywords —Image classification, saliency, folding, local binary 
patterns, support vector machines 


I. Introduction

Recent advances in medical imaging devices have led to
the generation of big image data on a daily basis. The main
purpose of a medical information system is the acquisition of
necessary information to provide high-quality care through
accurate and efficient diagnosis and treatment planning.

In order to implement advanced information systems operating
on large databases (hence handling big image data),
suitable methods are required to respond to a query (an image
selected by a clinician) by retrieving images that have similar
characteristics. Content-based image retrieval (CBIR) uses image
search techniques that incorporate visual features, such as
color, texture, and shape, in order to respond to the user’s queries.
In the medical imaging context, CBIR can contribute substantially
to more reliable diagnosis, among others, by classifying the
query image and retrieving similar images already annotated
with diagnostic descriptions and treatment results.

The main purpose of this work is to obtain a high classification
score with less computational complexity and lower
storage requirements. To save time while retaining a high
classification score, a salient region detector is used first.
Next, images are folded to mainly contain salient areas and to
reduce the effect of irrelevant (non-salient) regions. Subsequently,
we extract LBP features from the folded images and
classify them via SVM. We use the IRMA x-ray dataset with
14,410 images for training and testing. The classification result
is evaluated using the ImageCLEF error score, for which results
of different methods have been reported.

This paper is organized as follows: In Section II, a brief
background review on medical image retrieval is given. In
Section III, we describe the proposed approach. Section IV
reports the experimental results using the IRMA dataset. Section
V concludes the paper.

II. Literature Review 

There is a clear demand for fast and accurate image search
technologies in clinical settings, where physicians (e.g., radiologists)
wish to search for similar images of past patients
when examining a current patient. Content-based
image retrieval (CBIR) has been subject to research to satisfy
some aspects of this demand. CBIR takes advantage of the visual
contents of an image, such as colour, shape and texture, to
search for (similar) images in large archives. Generally, a
software system that can access medical archives to search for
similar images is a CBIR system. The “content-based” aspect
of CBIR simply means that the search is conducted based on
some visual (pictorial) features of the image, and not based
on text annotations (the latter is mainly used when we search
on the internet). Some examples of medical CBIR systems
are TELEMED, ASSERT and IRMA.

The features used in CBIR can be textual or visual. Recent
medical image retrieval systems increasingly rely on visual
features, which can be low-level (primitive), middle-level
(logical), or high-level (abstract) features.
Almost all early CBIR systems are based on low-level features
(colour or shape), but recently, mid- and high-level image
representations have received more attention. Mid-level features
are obtained from particular parts of the image, which
are important regions with significant details.
High-level features are represented with semantic design. The
semantic design (high-level features such as emotions, objects
and events) can be present in visual or textual information.

Local binary patterns (LBP) are utilized as features for texture
description. LBP descriptors are commonly used in
facial expression analysis and recognition. LBP
measures invariant texture of gray-scale images using
local neighborhoods. The basic LBP operator replaces pixel
values with labels by binarizing the 3x3 neighborhood around
each pixel with the centre pixel as a threshold. Pixel labels are
then converted to decimal numbers. Because LBP is an easy-to-compute
feature extraction method, it has been successfully

Algorithm 1 Proposed Approach

1: -- Pre-Processing --
2: Read all images I_i
3: Calculate saliency template S*
4: I_i^F ← Apply folding on all images I_i
5: Save S* and all folded images I_i^F
6: -- Training --
7: Read folded images I_i^F
8: Set number of classes N_c
9: Extract LBP features from folded data
10: Train SVM to generate the support vectors v_1, v_2, ...
11: Save v_1, v_2, ...
12: -- Online Classification --
13: Read the query image I^q
14: Read the saliency template S*
15: Read the support vectors v_1, v_2, ...
16: I_S^q ← Apply the saliency template S* on I^q
17: I_F^q ← Apply folding F on I_S^q
18: Extract LBP features from I_F^q
19: Classify the query using SVM


used in many studies such as face recognition and image
annotation. In the proposed method,
LBP is applied to multi-block patches in the image at different
scales. After labeling the image parts, the feature histogram
is extracted from the local region labels. The regions can be
rectangular, circular or triangular. Recently, a new approach to
binary encoding of local image information has been proposed, which
uses “barcodes” based on thresholding projections via the Radon
transform.
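
The basic LBP operator and the per-region label histogram described
above can be sketched as follows. This is a minimal, dependency-free
illustration and not the implementation used in the experiments: the
neighbour ordering, the ">=" comparison, and the function names
basic_lbp and lbp_histogram are assumptions made for the example.

```python
def basic_lbp(image):
    """Return the 8-bit LBP code of every interior pixel of a 2-D grey image."""
    # Fixed clockwise neighbour order starting at the top-left pixel; the
    # ordering is an assumption, any fixed order yields a valid labelling.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    rows, cols = len(image), len(image[0])
    codes = []
    for y in range(1, rows - 1):
        row_codes = []
        for x in range(1, cols - 1):
            centre = image[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                # Binarize the neighbour with the centre pixel as threshold.
                if image[y + dy][x + dx] >= centre:
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes


def lbp_histogram(codes, bins=256):
    """Normalized histogram of LBP labels over one image region."""
    hist = [0] * bins
    for row in codes:
        for code in row:
            hist[code] += 1
    total = sum(hist)
    return [count / total for count in hist]


patch = [[5, 9, 1],
         [4, 6, 7],
         [2, 3, 8]]
print(basic_lbp(patch))  # → [[26]]
```

In the example patch, the neighbours 9, 7 and 8 are not smaller than the
centre value 6, so bits 1, 3 and 4 are set and the decimal label is
2 + 8 + 16 = 26.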

Different methods can be used to classify images.
For our classification, we use SVM in this paper,
which is a supervised learning method to classify datasets.
It investigates sets of feature vectors in an N-dimensional
space. It uses support vectors to construct a hyperplane that
separates different classes by maximizing the margin between
them as defined by the given hyperplane.

III. The Proposed Approach 

In this section, we present the proposed image classification
method. This approach comprises a pre-processing
phase, offline training and an online usage phase. During
pre-processing, saliency maps are extracted and images are folded.
SVM is trained using LBP features of both folded and
unfolded images in the offline phase. Finally, online classification
is described. Algorithm 1 gives a generic overview of the
proposed approach.

A. Preprocessing 

The pre-processing of image data mainly consists of two 
procedures. The first procedure creates a saliency template, 
and the second procedure formulates the image folding based 
on the saliency template. 


Algorithm 2 Pre-Processing Stage: Saliency Template S*
1: N_c ← number of classes; i = 1
2: Initialize saliency template S* = []
3: while i < N_c do
4:    Calculate the saliency map S_i for image I_i
5:    S* ← S* + S_i
6:    i ← i + 1
7: end while


1) Saliency Map: The detection of salient regions of an
image is crucial to extract effective information. We propose
to create a saliency template by averaging all saliency maps,
which are detected by the context-aware saliency algorithm.

The context-aware saliency algorithm detects image regions
that best represent the “scene”. It is a detection algorithm,
as its authors state, “based on four principles observed in
the psychological literature: local low-level considerations,
global considerations, visual organizational rules, and high-level
factors”. The algorithm considers local low-level factors (such as
contrast and color), global considerations that suppress frequently
occurring features, visual organization rules (visual forms may possess
one or several centres of gravity) and high-level factors (such
as priors on the salient object location and object detection).
An implementation of this
algorithm is available on the authors’ website¹.

Saliency maps of all training images are generated and
averaged to calculate a saliency template S* (Algorithm 2).
Figure 1 shows three images, their saliency maps, and the
saliency template created by averaging all saliency maps. The
average of saliency maps is first calculated internally within
each class, then the average is taken across all classes.
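
The two-stage averaging (within each class, then across classes) can be
sketched as below. The saliency maps themselves are assumed to come from
an external detector and to share one size; the container layout
(a dictionary from class label to a list of maps) and the names mean_map
and saliency_template are hypothetical choices for this illustration.

```python
def mean_map(maps):
    """Pixel-wise average of equally sized 2-D maps (lists of lists)."""
    count = len(maps)
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[y][x] for m in maps) / count for x in range(cols)]
            for y in range(rows)]


def saliency_template(maps_by_class):
    """Average saliency maps within each class, then across classes."""
    class_means = [mean_map(maps) for maps in maps_by_class.values()]
    return mean_map(class_means)


# Toy 1x2 saliency maps: two images in one class, one image in another.
maps_by_class = {
    "class_a": [[[1.0, 0.0]], [[0.0, 0.0]]],
    "class_b": [[[0.0, 1.0]]],
}
print(saliency_template(maps_by_class))  # → [[0.25, 0.5]]
```

Averaging per class first keeps classes with many training images from
dominating the template, which a single global mean would not do.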

The salient, less salient and non-salient areas are defined
for the training data by dividing images into N sub-blocks. Then,
based on the saliency template, the folding is applied. The new
images with reduced area can now be used for local pattern
analysis.

2) Image Folding: Folding the rectangular region A ⊂ I
within image I, resulting in an image I' ⊂ I, can be given
through I' = A F I \ A, where the sign “\” denotes the
set-theoretical subtraction. The main purpose of folding is to
reduce the image area, and thereby the dimensionality of the
features, without losing information (see Fig. 2). The folding steps
are described in Algorithm 3.
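
As a rough illustration of the idea, the sketch below folds one border
strip of columns and one border strip of rows inward, so that a 4x4 block
grid becomes a 3x3 grid. It is only one hypothetical reading of folding,
not the search over combinations in the algorithm: the strip to fold is
chosen as the border strip with the lower summed template saliency, the
strip is mirrored onto its neighbour as when folding paper, and
overlapping pixels are merged by averaging. All three choices, and the
names fold_columns, transpose and fold_image, are assumptions.

```python
def fold_columns(image, template, strip):
    """Fold the less salient border strip (`strip` columns wide) inward."""
    width = len(template[0])
    # Total template saliency inside the left and right border strips.
    left = sum(sum(row[:strip]) for row in template)
    right = sum(sum(row[width - strip:]) for row in template)
    folded = []
    for row in image:
        if left <= right:
            flap = row[:strip][::-1]            # mirrored left strip
            base = row[strip:2 * strip]         # strip it lands on
            merged = [(a + b) / 2 for a, b in zip(flap, base)]
            folded.append(merged + row[2 * strip:])
        else:
            flap = row[width - strip:][::-1]    # mirrored right strip
            base = row[width - 2 * strip:width - strip]
            merged = [(a + b) / 2 for a, b in zip(base, flap)]
            folded.append(row[:width - 2 * strip] + merged)
    return folded


def transpose(grid):
    """Swap rows and columns of a 2-D list."""
    return [list(column) for column in zip(*grid)]


def fold_image(image, template, blocks=4):
    """Fold one block-column and one block-row (e.g. 4x4 -> 3x3 blocks)."""
    col_strip = len(image[0]) // blocks
    row_strip = len(image) // blocks
    image = fold_columns(image, template, col_strip)
    # Fold the template the same way so the row decision sees the new layout.
    template = fold_columns(template, template, col_strip)
    folded_t = fold_columns(transpose(image), transpose(template), row_strip)
    return transpose(folded_t)


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
template = [[0, 0, 0, 0], [0, 1, 1, 1], [0, 1, 1, 1], [0, 1, 1, 1]]
folded = fold_image(image, template)
print(len(folded), len(folded[0]))  # → 3 3
```

In this toy case the template is zero along the top row and left column,
so those two strips are folded onto their neighbours while the salient
lower-right area is left untouched.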

B. Offline Training 

LBP features are extracted from K (M > K) divided sub-blocks
of the image with different scaling factors (1 and 2). The LBP
feature vector for an image has 1,062 dimensions under the
following condition: M = 4x4, K = 3x3. The LBP
histogram features from the training data are used to train a
multi-class SVM to classify images. The SVM kernel type is

¹http://webee.technion.ac.il/labs/cgm/Computer-Graphics-Multimedia/Software/Saliency/Saliency.html
Fig. 2. Schematic illustration of saliency maps and image folding: The input image (left image) is processed to find a salient region (middle image).
Subsequently, non-salient regions (right image, gray stripes) are marked to be folded inwardly.




Fig. 1. A saliency map is generated for each available training image. A 
saliency template is then assembled by combining all saliency maps. 


set to be Radial Basis Function. 

C. Online Classification

In the online part, a query image is selected from the IRMA
test database and LBP features are calculated for the saliency-


Algorithm 3 Pre-Processing Stage: Image Folding
1: Set number of blocks M (= N x N = 4 x 4)
2: Read saliency template S*
3: Read the input image I
4: while not all combinations tested do
5:    Align two columns
6:    Take the summation of all pixel values in S*
7:    Keep S_max (maximum value of summed columns)
8:    Update
9: end while
10: while not all combinations tested do
11:    Align two rows
12:    Take the summation of all pixel values in S*
13:    Keep S_max (maximum value of summed rows)
14:    Update S_max
15: end while
16: Find the folding F_best that satisfies
17:    S = min(S_max^col, S_max^row)
18: Apply the folding F_best to I


based folded image as new images are encountered. Next, 
SVM classification is performed with LBP features. We also 
run the experiments for the LBP-SVM without folding. 

IV. Experiments and Results 

A. Data Set 

The Image Retrieval in Medical Applications (IRMA²)
database is a collection of 14,410 x-ray images that have been
randomly collected from daily routine work at the Department
of Diagnostic Radiology of the RWTH Aachen University³.
The downscaled images cover different ages,
genders, view positions, and pathologies.

Each image in the dataset has an IRMA code. According
to these codes, 193 classes are defined. The IRMA code comprises
four axes with three to four positions each: 1) the technical
code (T) (modality), 2) the directional code (D) (body
orientations), 3) the anatomical code (A) (body region), and
4) the biological code (B) (the biological system examined).
The complete IRMA code consists of 13 characters, TTTT-DDD-AAA-BBB,
with each character in {0, ..., 9; a, ..., z}.
A total of 12,677 images are set aside for training. The
remaining 1,733 images are used as test data.

²http://irma-project.org/

³http://www.rad.rwth-aachen.de/

Figure 3 shows some sample images from the IRMA dataset
along with their corresponding IRMA codes.

B. Error Measurement 

The ImageCLEF project has defined an error score evaluation
method in order to evaluate the classification performance
of methods on the IRMA dataset. As all images in the IRMA dataset
are labelled along the independent technical, directional, anatomical
and biological axes, the error E can be defined
as follows:

$$E = \sum_{i=1}^{n} \frac{1}{b_i} \, \frac{1}{i} \, \delta(I_i, \tilde{I}_i) \qquad (1)$$

where $b_i$ is the number of possible labels at position $i$ and $\delta$
is the decision function delivering 1 for a wrong label and 0
for a correct label when the IRMA code of the image $I_i$ is
compared with the IRMA code of the image $\tilde{I}_i$. For every
axis, the maximal possible error is computed and the errors
are normalized between 0 and 0.25. If all positions in all axes
are wrong, the error value is 1.
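
A simplified sketch of this score for a single image is given below. It
assumes the ImageCLEF form of Eq. (1), in which a wrong label at position
i contributes 1/(b_i · i), and it scales each axis by its maximal possible
error so that an axis contributes at most 0.25. The branching factors b_i
depend on the IRMA code hierarchy and are not given here, so the
`branching` argument and its placeholder values are purely hypothetical;
additional rules of the official evaluation are not modelled.

```python
def axis_error(true_axis, predicted_axis, branching):
    """Unnormalized error of one IRMA axis: sum of delta / (b_i * i)."""
    error = 0.0
    positions = zip(true_axis, predicted_axis, branching)
    for i, (t, p, b) in enumerate(positions, start=1):
        delta = 1 if t != p else 0   # 1 for a wrong label, 0 for a correct one
        error += delta / (b * i)
    return error


def irma_error(true_code, predicted_code, branching):
    """Error in [0, 1]; each of the four axes contributes at most 0.25."""
    total = 0.0
    axes = zip(true_code.split("-"), predicted_code.split("-"), branching)
    for t_axis, p_axis, b_axis in axes:
        # Maximal possible error of this axis (every position wrong).
        worst = sum(1 / (b * i) for i, b in enumerate(b_axis, start=1))
        total += 0.25 * axis_error(t_axis, p_axis, b_axis) / worst
    return total


# Placeholder branching factors (hypothetical): 10 labels at every position.
branching = [[10] * 4, [10] * 3, [10] * 3, [10] * 3]
print(irma_error("1121-127-700-500", "1121-127-700-500", branching))  # → 0.0
```

Because of the 1/i factor, an error in an early (coarse) position of an
axis is penalized more than an error in a late (fine) position.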

C. Classification Error 

The experiments resulted in an error score of 153.07 for
the proposed method of SVM image classification with multi-scale
LBP on saliency-based folded images. If images are not
folded, the SVM error slightly decreases to 146.55. This slight
decrease in error comes at a higher computational cost:
the feature dimensionality without folding (1,888) is nearly
twice that with folding (1,062). In other words, the accuracy falls
only slightly while time and computational cost decrease. Saliency-based
folding reduces complexity without losing important patterns
in the salient region; the computational complexity decreases
because folding reduces the feature vector dimension.

Folding was also tried in different directions without
considering the salient area. The error clearly increased, which
suggests that the saliency template plays a crucial role in deciding
how to fold an image.

For the sake of comparison, the IRMA dataset was used in the
ImageCLEF 2009 competition with the 2008 IRMA code. There,
basic LBP with 4x4 multi-blocks was applied by VPASabanci, and the
error score is reported as 261.2. In addition, the lowest
error score in ImageCLEF 2009 with the 2008 IRMA code is
169.5, obtained by TAU. The comparison of these classifiers and
the SVM results is outlined in Table 1.

D. Memory and Time 

The image area is reduced by 50% with saliency-based folding.
As an effect, the number of feature dimensions decreases
from 1,888 to 1,062, which equals a 44% decrease in feature
dimensionality.
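
The reported dimensionalities can be checked with a short calculation.
Both values are consistent with one 59-bin histogram per block and
scale (the size of a uniform LBP histogram with 8 neighbours); this bin
count is an assumption that is not stated in the text, as is the helper
name feature_dim.

```python
BINS = 59     # assumed: uniform LBP histogram with 8 neighbours
SCALES = 2    # scaling factors 1 and 2


def feature_dim(blocks_per_side, bins=BINS, scales=SCALES):
    """Length of the concatenated multi-block, multi-scale LBP histogram."""
    return blocks_per_side ** 2 * bins * scales


unfolded = feature_dim(4)   # 4x4 blocks
folded = feature_dim(3)     # 3x3 blocks after folding
reduction = 100 * (unfolded - folded) / unfolded
print(unfolded, folded, reduction)  # → 1888 1062 43.75
```

The 43.75% reduction is what the text rounds to 44%.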


SVM needs 141.17 seconds training time and 92.51 seconds
testing time without saliency-based folding. That corresponds
to 53 milliseconds per image for online queries.

In contrast, with saliency-based folding SVM only needs
60.36 seconds training time and 52.56 seconds testing time.
That corresponds to 30 milliseconds per image for online
queries. Neglecting the overhead of the saliency calculations
and looking only at the testing times (online execution),
the proposed approach accelerates the classification
process by roughly 43% per query.
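
The per-image times and the acceleration follow directly from the
reported testing times and the 1,733 test images, as the following
short check shows (the helper name ms_per_image is chosen for this
illustration).

```python
TEST_IMAGES = 1733


def ms_per_image(total_seconds, images=TEST_IMAGES):
    """Average online time per query image in milliseconds."""
    return 1000 * total_seconds / images


unfolded_ms = ms_per_image(92.51)   # without folding, about 53 ms
folded_ms = ms_per_image(52.56)     # with folding, about 30 ms
speedup = 100 * (unfolded_ms - folded_ms) / unfolded_ms
print(round(unfolded_ms), round(folded_ms), round(speedup))  # → 53 30 43
```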


Method                  Error     t (ms)/image
LBP/SVM                 146.55    53
LBP/SVM w. folding      153.07    30
TAU                     169.5     -
VPASabanci              261.2     -

Table 1. Image classification results (multi-scale LBP on n x n blocks,
t = time); results of TAU and VPASabanci as reported in the literature.


V. Conclusions 

Content-based image retrieval (CBIR) depends on good
classification first to assign a query to the right image
category. The time requirements become paramount when
dealing with big data.

The proposed medical image classification using the saliency-based
folding method appears to be effective when
support vector machines and local binary patterns are employed.
Folding non-salient (non-relevant) parts of the image
may result in a slight increase of the classification error. That may
be expected since folding areas overlap with salient regions,
resulting in slight distortion. However, the proposed approach
does accelerate the online classification, an advantage that
might be crucial for big image data (a reduction from 53
milliseconds per image to 30 milliseconds, corresponding to
43% acceleration).

The decision of how to fold image blocks is the most critical
part of the pre-processing. Different approaches can be
examined in future work to investigate the feasibility and the
potential effect of folding blocks and not necessarily just folding
rows and columns. As well, one may consider the deletion
of non-salient blocks altogether. This may be of particular
interest in non-medical cases where the scene may contain
irrelevant information along with objects of interest.

As potential future work, one may also investigate the incorporation
of the new barcode technology into retrieval-oriented
classification combined with optimization techniques
that employ the concept of opposite entities.
(a) 1121-127-700-500 (b) 1121-120-918-700 (c) 1121-120-942-700 (d) 112d-121-500-000 (e) 1123-127-500-000

(f) 1121-120-200-700 (g) 1121-200-412-700 (h) 1121-110-414-700 (i) 1121-240-442-700 (j) 1121-220-310-700

Fig. 3. Sample images from IRMA Dataset with their IRMA codes TTTT-DDD-AAA-BBB.